Skip to content

feat: add iceberg_data library alongside iceberg#631

Merged
wgtmac merged 3 commits intoapache:mainfrom
zhjwpku:split-iceberg-core-data
May 5, 2026
Merged

feat: add iceberg_data library alongside iceberg#631
wgtmac merged 3 commits intoapache:mainfrom
zhjwpku:split-iceberg-core-data

Conversation

@zhjwpku
Copy link
Copy Markdown
Collaborator

@zhjwpku zhjwpku commented Apr 27, 2026

Move data writers, deletes/, and puffin/ into a separate iceberg_data library that links the existing iceberg target. delete_file_index stays in iceberg because manifest_group embeds DeleteFileIndex::Builder with only core dependencies.

  • iceberg — unchanged target name for metadata/planning, expressions, manifests, catalog (incl. in-memory), utilities, file I/O abstractions, and delete_file_index.

  • iceberg_data — data/, deletes/, puffin/; links iceberg.

iceberg_bundle links iceberg_data when the bundle is built. iceberg_rest links iceberg and cpr only.

Copy link
Copy Markdown
Contributor

@MisterRaindrop MisterRaindrop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread src/iceberg/deletes/roaring_position_bitmap.h Outdated
Comment thread src/iceberg/deletes/roaring_position_bitmap.h Outdated
@zhjwpku zhjwpku force-pushed the split-iceberg-core-data branch 3 times, most recently from c766bd8 to 889125a Compare May 1, 2026 13:28
Comment thread example/CMakeLists.txt
set(CMAKE_CXX_STANDARD 23)

find_package(iceberg CONFIG REQUIRED)
find_package(iceberg CONFIG REQUIRED COMPONENTS bundle rest)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't have to be changed, I updated it mainly to test the COMPONENTS keyword and thought it could be a useful reference for others. I will change it back if any objection.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR splits data-path functionality (writers, deletes, puffin) into a new iceberg_data library while keeping the existing iceberg target focused on core metadata/planning and utilities, and updates build/test wiring accordingly.

Changes:

  • Introduce iceberg_data as a separate library (Meson + CMake) and move data/, deletes/, and puffin/ sources to it.
  • Update unit test build definitions to link against iceberg_data where needed.
  • Adjust packaging/examples/CI scripts for the new component structure and Windows environment detection.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/iceberg/test/meson.build Adds per-test selection to link against iceberg vs iceberg_data.
src/iceberg/test/CMakeLists.txt Adds USE_DATA option for tests to link iceberg_data where required.
src/iceberg/meson.build Creates iceberg_data library target and adjusts dependencies/source lists.
src/iceberg/iceberg_export.h Tweaks template export macro behavior for non-Windows builds.
src/iceberg/iceberg-config.cmake.in Documents new iceberg_data CMake targets/components.
src/iceberg/file_writer.h Exports WriterProperties for cross-library visibility.
src/iceberg/data/meson.build Installs public headers for iceberg/data/*.
src/iceberg/arrow_c_data_guard_internal.h Exports guard classes used across the new library boundary.
src/iceberg/CMakeLists.txt Adds iceberg_data CMake target and wires bundle to depend on it.
example/CMakeLists.txt Requests bundle and rest components explicitly in find_package(iceberg ...).
ci/scripts/build_iceberg.sh Improves Windows detection to include Cygwin.
ci/scripts/build_example.sh Improves Windows detection to include Cygwin.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/iceberg/CMakeLists.txt Outdated
endif()

if(TARGET iceberg_data_static)
target_compile_definitions(iceberg_data_static PRIVATE ICEBERG_STATIC)
Copy link

Copilot AI May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iceberg_data_static sets ICEBERG_STATIC as a PRIVATE compile definition, but consumers of the static library also need ICEBERG_STATIC so that ICEBERG_EXPORT in public headers (e.g., iceberg/data/writer.h) doesn’t expand to dllimport on Windows. As-is, linking an external target against iceberg::iceberg_data_static can fail due to incorrect import/export annotations. Consider making ICEBERG_STATIC an INTERFACE/PUBLIC compile definition on iceberg_data_static (potentially gated to WIN32), or switching the data headers to a dedicated ICEBERG_DATA_* export macro that add_iceberg_lib() already defines.

Suggested change
target_compile_definitions(iceberg_data_static PRIVATE ICEBERG_STATIC)
target_compile_definitions(iceberg_data_static
PUBLIC $<$<PLATFORM_ID:Windows>:ICEBERG_STATIC>)

Copilot uses AI. Check for mistakes.
@kamcheungting-db
Copy link
Copy Markdown

Out of curiosity, what is the purpose of creating this iceberg_data

@zhjwpku
Copy link
Copy Markdown
Collaborator Author

zhjwpku commented May 4, 2026

Out of curiosity, what is the purpose of creating this iceberg_data

Check this issue #627, we can have further discussion there.

@kamcheungting-db
Copy link
Copy Markdown

Out of curiosity, what is the purpose of creating this iceberg_data

Check this issue #627, we can have further discussion there.

thank you.

@zhjwpku zhjwpku force-pushed the split-iceberg-core-data branch from 6ae47fd to 190c145 Compare May 4, 2026 07:49
Comment thread src/iceberg/meson.build Outdated
)

iceberg_data_interface_args = []
if get_option('default_library') != 'shared'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use == 'static' here, like iceberg_rest. With --default-library=both, consumers can link the shared iceberg_data library but still get ICEBERG_DATA_STATIC, so Windows builds won't import the DLL symbols correctly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, fixed.

Comment thread src/iceberg/test/CMakeLists.txt Outdated
visit_type_test.cc)

add_iceberg_test(roaring_test SOURCES roaring_test.cc)
add_iceberg_test(roaring_test USE_DATA SOURCES roaring_test.cc)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we can remove the executable roaring_test as well as the file roaring_test.cc.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I checked the test file, it's just trivial tests with no Iceberg-related logic, removed.


/// \brief Metadata about a Puffin file.
struct ICEBERG_EXPORT FileMetadata {
struct ICEBERG_DATA_EXPORT FileMetadata {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure if puffin file metadata (and other associated metadata definitions) belong to the iceberg_data library. I'm fine to go with the current PR and change if necessary in the future.

Comment thread src/iceberg/CMakeLists.txt Outdated
Comment on lines +206 to +208
if(TARGET iceberg_data_static)
target_compile_definitions(iceberg_data_static PRIVATE ICEBERG_STATIC)
endif()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(TARGET iceberg_data_static)
target_compile_definitions(iceberg_data_static PRIVATE ICEBERG_STATIC)
endif()

iceberg_data_static has linked iceberg_static so all symbols from the latter should be imported. Let's remove these lines.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +51 to +53
target_link_libraries(${test_name}
PRIVATE "$<IF:$<TARGET_EXISTS:iceberg_data_static>,iceberg_data_static,iceberg_data_shared>"
GTest::gmock_main)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
target_link_libraries(${test_name}
PRIVATE "$<IF:$<TARGET_EXISTS:iceberg_data_static>,iceberg_data_static,iceberg_data_shared>"
GTest::gmock_main)
target_link_libraries(${test_name} PRIVATE iceberg_data_static GTest::gmock_main)

nit: let's fix it either by using static libraries here or fix other branches to add the same detection.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me create another PR to make it consistent.

@wgtmac wgtmac merged commit 0db527a into apache:main May 5, 2026
18 of 19 checks passed
@wgtmac
Copy link
Copy Markdown
Member

wgtmac commented May 5, 2026

Thanks @zhjwpku for adding this and @MisterRaindrop @kamcheungting-db for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants